[no-relnote] Add E2E for containerd #1313
Conversation
Pull Request Overview
This PR adds comprehensive E2E testing for containerd runtime configuration alongside existing Docker testing infrastructure. The changes introduce a nested container testing framework that allows running tests inside containers to validate NVIDIA Container Toolkit behavior in containerized environments.
- Adds new E2E tests for containerd drop-in configuration functionality
- Introduces nvidia-cdi-refresh systemd unit testing
- Implements nested container runner infrastructure for isolated testing
Reviewed Changes
Copilot reviewed 9 out of 32 changed files in this pull request and generated 1 comment.
File | Description |
---|---|
tests/go.mod | Adds new dependencies for UUID generation and test utilities |
tests/e2e/runner.go | Implements nested container runner with Docker installation and CTK setup |
tests/e2e/nvidia-ctk_containerd_test.go | New comprehensive containerd E2E test suite |
tests/e2e/nvidia-ctk_docker_test.go | Refactors to use shared runner infrastructure and fixes macOS compatibility |
tests/e2e/nvidia-cdi-refresh_test.go | New systemd unit tests for CDI refresh functionality |
tests/e2e/nvidia-container-cli_test.go | Refactors to use nested container runner |
tests/e2e/installer.go | Adds containerd installation template and additional flags support |
tests/e2e/e2e_test.go | Centralizes test runner initialization in BeforeSuite |
tests/e2e/Makefile | Documents new test categories |
Pull Request Test Coverage Report for Build 18005738357 (Coveralls)
Force-pushed 5c85056 to 3dc480c
I'll mark this PR as ready for review once #1235 is merged
Force-pushed 3dc480c to 563a192
Rebased
Force-pushed 029af03 to 1899001
Force-pushed 51ad031 to c65e468
Rebased
AfterAll(func(ctx context.Context) {
	// Cleanup: remove the container and the temporary script on the host.
	// Use || true to ensure cleanup doesn't fail the test
	runner.Run(fmt.Sprintf("docker rm -f %s 2>/dev/null || true", containerName)) //nolint:errcheck
Instead of the nolint let's just drop the return values.
Suggested change:
- runner.Run(fmt.Sprintf("docker rm -f %s 2>/dev/null || true", containerName)) //nolint:errcheck
+ _, _, _ = runner.Run(fmt.Sprintf("docker rm -f %s 2>/dev/null || true", containerName))
Does it make sense to at least WARN if the cleanup fails? The `|| true` doesn't ensure that the test doesn't fail; the fact that we don't check the return value does that.
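A minimal sketch of that warning path, assuming `runner.Run` returns stdout, stderr, and an error as it does elsewhere in this suite:

```go
// Best-effort cleanup: log a warning on failure instead of failing the spec.
if _, stderr, err := runner.Run(fmt.Sprintf("docker rm -f %s 2>/dev/null", containerName)); err != nil {
	GinkgoWriter.Printf("Warning: failed to remove container %s: %v (stderr: %s)\n", containerName, err, stderr)
}
```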
# Remove any imports line from the config (reset to original state)
if [ -f /etc/containerd/config.toml ]; then
	grep -v "^imports = " /etc/containerd/config.toml > /tmp/config.toml.tmp && mv /tmp/config.toml.tmp /etc/containerd/config.toml || true
fi
Why not just make a copy of the original config and restore that after / before each test?
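A sketch of that approach, with an assumed backup path: snapshot once up front, then restore around each spec instead of grepping lines out.

```go
// In BeforeAll: snapshot the pristine config (backup path is illustrative).
_, _, err := nestedContainerRunner.Run(`cp -f /etc/containerd/config.toml /tmp/containerd-config.toml.orig`)
Expect(err).ToNot(HaveOccurred())

// In AfterEach (or BeforeEach): restore the snapshot.
_, _, err = nestedContainerRunner.Run(`cp -f /tmp/containerd-config.toml.orig /etc/containerd/config.toml && systemctl restart containerd`)
Expect(err).ToNot(HaveOccurred())
```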
# Restart containerd to pick up the clean config
systemctl restart containerd
sleep 2
Is there a way to check containerd health?
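One option is to poll the daemon rather than sleep; a sketch using standard commands:

```go
// Wait for containerd to answer an API round-trip instead of sleeping 2s.
_, _, err := nestedContainerRunner.Run(`
systemctl restart containerd
timeout 30 sh -c 'until ctr version >/dev/null 2>&1; do sleep 1; done'
`)
Expect(err).ToNot(HaveOccurred(), "containerd did not become ready")
```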
output, _, err := nestedContainerRunner.Run(`cat /etc/containerd/conf.d/99-nvidia.toml`)
Expect(err).ToNot(HaveOccurred())
Expect(output).To(ContainSubstring(`nvidia`))
Expect(output).To(ContainSubstring(`nvidia-cdi`))
Expect(output).To(ContainSubstring(`nvidia-legacy`))
As I mentioned in person, we are no longer triggering the configuration of containerd with the current installation mechanism.
output, _, err = nestedContainerRunner.Run(`containerd config dump`)
Expect(err).ToNot(HaveOccurred())
// Verify imports section is in the merged config
Expect(output).To(ContainSubstring(`imports = ['/etc/containerd/conf.d/*.toml']`))
I actually think that `containerd config dump` prints the ACTUAL paths of all files processed.
ContainSubstring(`default_runtime_name = "nvidia"`),
ContainSubstring(`default_runtime_name = 'nvidia'`),
These should definitely be VERSION-specific checks.
Tests that pass in multiple states are prone to be flaky. Please pick the format for the version of containerd that we have installed by default and use that. This makes the differences between SPECIFIC containerd versions more obvious when reading the tests.
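For instance, if the node image is assumed to ship containerd 2.x (which dumps v3-style, single-quoted TOML), the check could be pinned to exactly one form:

```go
// Pinned to containerd 2.x output; a 1.7.x suite would assert the
// double-quoted v2 form instead. The version assumption must match the image.
Expect(output).To(ContainSubstring(`default_runtime_name = 'nvidia'`))
```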
ContainSubstring(`default_runtime_name = "nvidia"`),
ContainSubstring(`default_runtime_name = 'nvidia'`),
))
Expect(output).To(ContainSubstring(`enable_cdi = true`))
Where do we toggle this behaviour? It is disabled by default.
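For context, the toggle is the configure command's CDI flag; the spelling below is the one used later in this PR:

```go
// enable_cdi = true is only written when CDI is requested explicitly.
_, _, err := nestedContainerRunner.Run(`nvidia-ctk runtime configure --runtime=containerd --cdi.enabled`)
Expect(err).ToNot(HaveOccurred())
```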
ContainSubstring(`[plugins."io.containerd.grpc.v1.cri".containerd.runtimes.nvidia]`),
ContainSubstring(`[plugins.'io.containerd.cri.v1.runtime'.containerd.runtimes.nvidia]`),
Once again, this should be version-specific.
})
When("containerd already has a custom default runtime configured", func() { | ||
It("should preserve the existing default runtime when --set-as-default=false is specified", func(ctx context.Context) { |
`--set-as-default=false` is not specified. It is the default.
Note that we don't strictly speaking need a custom runtime to test the behaviour of NOT overriding the default.
`)
Expect(err).ToNot(HaveOccurred())
// Configure containerd with drop-in config (explicitly not setting as default) |
How are we configuring with the drop-in config in this case?
})
When("containerd has multiple custom runtimes and plugins configured", func() { | ||
It("should add NVIDIA runtime alongside existing runtimes like kata", func(ctx context.Context) { |
As a matter of interest, how is this different to an arbitrary "custom" runtime?
Expect(err).ToNot(HaveOccurred())

// Verify kata runtime was added
output, _, err := nestedContainerRunner.Run(`systemctl restart containerd && sleep 2 && containerd config dump | grep -A5 kata`)
Why the `-A5`? Also, please split the different steps.
Expect(err).ToNot(HaveOccurred())
Expect(output).To(ContainSubstring(`kata`))
// Configure containerd with drop-in config and set NVIDIA as default |
Where are we setting these options?
_, _, err := nestedContainerRunner.Run(`
rm -f /etc/containerd/config.toml
rm -rf /etc/containerd/conf.d
mkdir -p /etc/containerd/conf.d

cat > /etc/containerd/config.toml <<'EOF'
version = 3
How do we ensure that this is only run on containerd versions that support it?

# Create a custom config that will be imported
# Use the correct plugin path for containerd v2/v3
cat > /etc/containerd/custom.d/10-custom.toml <<'EOF'
I thought the correct path was /etc/containerd/conf.d? https://github.com/containerd/containerd/pull/12323/files
Expect(err).ToNot(HaveOccurred())
// Verify containerd can load the custom import before installer
_, _, err = nestedContainerRunner.Run(`systemctl restart containerd && sleep 2 && containerd config dump | grep -i myregistry`)
Why are we randomly checking for `myregistry`? Is this sufficient?
// Run a container with NVIDIA runtime
// Note: We use native snapshotter because overlay doesn't work in nested containers
output, stderr, err = nestedContainerRunner.Run(`
ctr run --rm --snapshotter native docker.io/library/busybox:latest test-nvidia echo "Hello from NVIDIA runtime" |
How does this use the nvidia runtime?
Also, if no devices are requested, then the nvidia-runtime is a no-op.
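A sketch of an invocation that would actually exercise the runtime. `ctr` bypasses the CRI runtime table, so the runtime binary and a device request have to be passed explicitly; the flags are standard `ctr run` flags, but the exact wiring here is an assumption, not taken from the PR:

```go
// Select the NVIDIA runtime binary explicitly and request devices; without
// NVIDIA_VISIBLE_DEVICES the run is indistinguishable from plain runc.
output, _, err := nestedContainerRunner.Run(`
ctr run --rm --snapshotter native \
  --runc-binary nvidia-container-runtime \
  --env NVIDIA_VISIBLE_DEVICES=all \
  docker.io/library/busybox:latest test-nvidia nvidia-smi -L
`)
```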

// Pull a test image
output, stderr, err := nestedContainerRunner.Run(`
ctr image pull docker.io/library/busybox:latest
Does ctr automatically use the containerd config?
Force-pushed c65e468 to ef6d766
Pull Request Overview
Copilot reviewed 3 out of 3 changed files in this pull request and generated 5 comments.
When("configuring containerd on a Kubernetes node", func() { | ||
It("should add NVIDIA runtime using drop-in config without modifying the main config", func(ctx context.Context) { | ||
// Configure containerd using nvidia-ctk | ||
cmd := fmt.Sprintf(nvidiaCtkConfigureContainerdCmd, "--nvidia-set-as-default --cdi.enabled") |
Copilot AI (Oct 15, 2025)
The flag '--nvidia-set-as-default' is inconsistent with the flag used elsewhere ('--set-as-default=false') and is likely invalid in nvidia-ctk. Replace with '--set-as-default' to ensure the command executes correctly.
Suggested change:
- cmd := fmt.Sprintf(nvidiaCtkConfigureContainerdCmd, "--nvidia-set-as-default --cdi.enabled")
+ cmd := fmt.Sprintf(nvidiaCtkConfigureContainerdCmd, "--set-as-default --cdi.enabled")
Expect(output).To(ContainSubstring(`kata`))

// Configure containerd using nvidia-ctk with nvidia as default
cmd := fmt.Sprintf(nvidiaCtkConfigureContainerdCmd, "--nvidia-set-as-default --cdi.enabled")
Copilot AI (Oct 15, 2025)
Same issue as above: '--nvidia-set-as-default' should be '--set-as-default' to match nvidia-ctk CLI flags.
Suggested change:
- cmd := fmt.Sprintf(nvidiaCtkConfigureContainerdCmd, "--nvidia-set-as-default --cdi.enabled")
+ cmd := fmt.Sprintf(nvidiaCtkConfigureContainerdCmd, "--set-as-default --cdi.enabled")
Prefer `--set-as-default` over `--nvidia-set-as-default`.
// Verify NVIDIA runtime was added
Expect(output).To(SatisfyAny(
ContainSubstring(`[plugins."io.containerd.grpc.v1.cri".containerd.runtimes.nvidia]`),
ContainSubstring(`[plugins.'io.containerd.cri.v1.runtime'.containerd.runtimes.nvidia]`),
Copilot AI (Oct 15, 2025)
The second ContainSubstring uses single quotes around the plugin key; containerd config dump uses double quotes. Update to `[plugins."io.containerd.cri.v1.runtime".containerd.runtimes.nvidia]` to avoid false negatives when matching the v3 config structure.
Suggested change:
- ContainSubstring(`[plugins.'io.containerd.cri.v1.runtime'.containerd.runtimes.nvidia]`),
+ ContainSubstring(`[plugins."io.containerd.cri.v1.runtime".containerd.runtimes.nvidia]`),
Expect(err).ToNot(HaveOccurred(), "Failed to install toolkit for containerd") | ||
}) | ||
|
||
AfterAll(func(ctx context.Context) { |
Copilot AI (Oct 15, 2025)
If BeforeAll fails before initializing nestedContainerRunner, this AfterAll will panic by calling Run on a nil interface. Add a nil check guard (e.g., if nestedContainerRunner == nil { best-effort docker rm and return }) before using it.
Suggested change:
AfterAll(func(ctx context.Context) {
	// If nestedContainerRunner is nil, BeforeAll failed before initialization.
	// Best-effort cleanup: remove the container and return.
	if nestedContainerRunner == nil {
		_, _, err := runner.Run(fmt.Sprintf("docker rm -f %s 2>/dev/null || true", containerName))
		if err != nil {
			GinkgoWriter.Printf("Warning: failed to cleanup container %s: %v\n", containerName, err)
		}
		return
	}
# Restart containerd and wait for it to be ready
systemctl restart containerd

# Wait for containerd to be responsive (up to 30 seconds)
for i in $(seq 1 30); do
	if ctr version >/dev/null 2>&1; then
		echo "containerd is ready"
		break
	fi
	if [ "$i" -eq 30 ]; then
		echo "ERROR: containerd failed to become ready"
		exit 1
	fi
	sleep 1
done
`)
Expect(err).ToNot(HaveOccurred(), "Failed to reset containerd configuration")
Copilot AI (Oct 15, 2025)
The restart-and-wait logic is duplicated here and again at lines 289–298. Consider extracting this into a small helper (e.g., restartContainerdAndWait(r Runner)) to reduce duplication and keep tests concise.
Suggested change:
- # Restart containerd and wait for it to be ready
- systemctl restart containerd
- # Wait for containerd to be responsive (up to 30 seconds)
- for i in $(seq 1 30); do
-     if ctr version >/dev/null 2>&1; then
-         echo "containerd is ready"
-         break
-     fi
-     if [ "$i" -eq 30 ]; then
-         echo "ERROR: containerd failed to become ready"
-         exit 1
-     fi
-     sleep 1
- done
- `)
- Expect(err).ToNot(HaveOccurred(), "Failed to reset containerd configuration")
+ `)
+ Expect(err).ToNot(HaveOccurred(), "Failed to reset containerd configuration")
+ // Restart containerd and wait for it to be ready
+ err = restartContainerdAndWait(nestedContainerRunner)
+ Expect(err).ToNot(HaveOccurred(), "Failed to restart and wait for containerd")
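A sketch of that helper, assuming the `Runner` interface named in the suggestion:

```go
// restartContainerdAndWait restarts containerd and polls until the daemon
// answers, failing after 30 seconds. Extracted to remove the duplication.
func restartContainerdAndWait(r Runner) error {
	_, _, err := r.Run(`
systemctl restart containerd
for i in $(seq 1 30); do
	if ctr version >/dev/null 2>&1; then
		exit 0
	fi
	sleep 1
done
echo "ERROR: containerd failed to become ready" >&2
exit 1
`)
	return err
}
```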
nestedContainerCacheDir = strings.TrimSpace(tmpdir)

// Prepare cache on the remote runner (where Docker is)
installer, err := NewToolkitInstaller(
Is there a reason that we can't use the "global" toolkitInstaller?
Hadn't thought about that; now simplified.

# Backup custom.d directory if it exists
if [ -d /etc/containerd/custom.d ]; then
cp -r /etc/containerd/custom.d /tmp/containerd-custom.d.backup
The containerd source code does not seem to include `custom.d`. Why are we copying this?
I was creating one during development; it's not needed any more.

# Backup config.d directory if it exists
if [ -d /etc/containerd/config.d ]; then
cp -r /etc/containerd/config.d /tmp/containerd-config.d.backup
The path that is searched by default is /etc/containerd/conf.d. See containerd/containerd#12323
Done
# Backup the original config.toml
if [ -f /etc/containerd/config.toml ]; then
cp /etc/containerd/config.toml /tmp/containerd-config.toml.backup
fi
Just as a thought: does using `-f -r` as arguments for `cp` for all of the things we want to back up make this (and the subsequent cleanup) simpler?
systemctl restart containerd || true
`)
if err != nil {
GinkgoWriter.Printf("Warning: failed to restore containerd configuration: %v\n", err)
Does `GinkgoLogr` not have an actual `Warning` method?
Otherwise just use `GinkgoLogr.Error` here?
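For reference, `GinkgoLogr` is a `logr.Logger`, which has `Info` (with verbosity levels) and `Error` but no `Warning` method, so the closest direct fit is:

```go
// logr has no Warning level; Error attaches the error to the log entry.
GinkgoLogr.Error(err, "failed to restore containerd configuration")
```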
BeforeEach(func(ctx context.Context) {
	// Restore from backup to ensure clean state for each test
	_, _, err := nestedContainerRunner.Run(`
# Restore the original config.toml
It seems strange to create the backup in `BeforeAll` and then restore it directly in `BeforeEach`. Why not restore the backup in `AfterEach` instead? This would mean that we don't need the logic in `AfterAll`?
You are right, refactored
# Restart containerd and wait for it to be ready
systemctl restart containerd

# Wait for containerd to be responsive (up to 30 seconds)
for i in $(seq 1 30); do
	if ctr version >/dev/null 2>&1; then
		echo "containerd is ready"
		break
	fi
	if [ "$i" -eq 30 ]; then
		echo "ERROR: containerd failed to become ready"
		exit 1
	fi
	sleep 1
done
This should be logically separate from restoring the backups.
Expect(err).ToNot(HaveOccurred(), "Failed to reset containerd configuration") | ||
}) | ||
|
||
When("configuring containerd on a Kubernetes node", func() { |
nit: We're not on a Kubernetes node.
edited
When("configuring containerd on a Kubernetes node", func() { | ||
It("should add NVIDIA runtime using drop-in config without modifying the main config", func(ctx context.Context) { | ||
// Configure containerd using nvidia-ctk | ||
cmd := fmt.Sprintf(nvidiaCtkConfigureContainerdCmd, "--nvidia-set-as-default --cdi.enabled") |
Let's definitely just use string concatenation here. I would even go so far as to say that we should just duplicate the command every time we want to run it.
rm -rf /etc/containerd/config.d
mkdir -p /etc/containerd/config.d
Why are we removing and creating this folder? Does this not contradict us restoring the backups in `BeforeEach`?
Agreed, adopted
SystemdCgroup = true
CustomOption = "custom-value"
Are these values relevant? Should we check that the applied config includes these settings for the `nvidia` runtime?
// NVIDIA runtime should be added
Expect(output).To(ContainSubstring(`nvidia`))
This is a very BROAD check.
// Custom runtime should be preserved with all its options
Expect(output).To(MatchRegexp(`(?s)custom-runtime.*CustomOption.*custom-value`))
That's not what this checks. For example, the `SystemdCgroup` setting is not checked.
Signed-off-by: Carlos Eduardo Arango Gutierrez <[email protected]>
Force-pushed ef6d766 to 07e70a7
Pull Request Overview
Copilot reviewed 3 out of 3 changed files in this pull request and generated 5 comments.
ContainSubstring(`[plugins."io.containerd.grpc.v1.cri".containerd.runtimes.nvidia]`),
ContainSubstring(`[plugins.'io.containerd.cri.v1.runtime'.containerd.runtimes.nvidia]`),
))
Expect(output).To(ContainSubstring(`BinaryName = `)) // The NVIDIA container runtime binary
Copilot AI (Oct 17, 2025)
This assertion is too weak and may produce false positives since any runtime's BinaryName line would satisfy it. Strengthen it to assert the NVIDIA runtime binary specifically, e.g., use a regex similar to other tests: `Expect(output).To(MatchRegexp(`(?s)runtimes\.nvidia\].*options\].*BinaryName.*nvidia-container-runtime`))`.
Suggested change:
- Expect(output).To(ContainSubstring(`BinaryName = `)) // The NVIDIA container runtime binary
+ Expect(output).To(MatchRegexp(`(?s)runtimes\.nvidia\].*options\].*BinaryName.*nvidia-container-runtime`)) // The NVIDIA container runtime binary
AfterEach(func(ctx context.Context) {
	// Step 1: Restore containerd configuration from backup
	_, _, err := nestedContainerRunner.Run(`
# Restore the original conf.d
if [ -d /tmp/containerd-conf.d.backup ]; then
	rm -rf /etc/containerd/conf.d
	cp -r /tmp/containerd-conf.d.backup /etc/containerd/conf.d
else
	# If no backup exists, just clean up
	rm -rf /etc/containerd/conf.d
	mkdir -p /etc/containerd/conf.d
fi
`)
Copilot AI (Oct 17, 2025)
Only conf.d is restored; /etc/containerd/config.toml is left mutated across tests while the suite uses Ordered and ContinueOnFailure. A failure can leave broken state for subsequent tests. Consider also backing up and restoring /etc/containerd/config.toml (e.g., copy to /tmp/containerd-config.toml.backup in BeforeAll and restore here), or remove ContinueOnFailure to avoid cascading failures.

When("containerd has multiple custom runtimes and plugins configured", func() {
It("should add NVIDIA runtime alongside existing runtimes like kata", func(ctx context.Context) {
// Add some custom settings to the existing kindest/base config
Copilot AI (Oct 17, 2025)
This comment is misleading because tests explicitly share state (Ordered) and earlier specs overwrite /etc/containerd/config.toml. Either ensure a fresh baseline before appending (e.g., regenerate the default config) or update the comment to reflect that you are appending to the current config state from prior tests.
Suggested change:
- // Add some custom settings to the existing kindest/base config
+ // Add some custom settings to the current config state (which may have been modified by previous tests)
Expect(output).To(SatisfyAny(
ContainSubstring(`[plugins."io.containerd.grpc.v1.cri".containerd.runtimes.nvidia]`),
ContainSubstring(`[plugins.'io.containerd.cri.v1.runtime'.containerd.runtimes.nvidia]`),
))
Copilot AI (Oct 17, 2025)
This duplicates the earlier runtime-section presence check in the same test (lines 127–130). Consider removing one of these blocks to reduce redundancy.
Suggested change (remove the duplicate block):
- Expect(output).To(SatisfyAny(
- ContainSubstring(`[plugins."io.containerd.grpc.v1.cri".containerd.runtimes.nvidia]`),
- ContainSubstring(`[plugins.'io.containerd.cri.v1.runtime'.containerd.runtimes.nvidia]`),
- ))
There are still some unanswered questions below.
Also, the intent of this PR is to test how the merged configs are handled by different containerd versions. I don't feel as if this really achieves this, since we're only ever testing the version of containerd that is included in the kind node image.
Can one parameterize the test for 1.7.x and 2.0 so that we can validate the behavior there? (I'm happy to do this as a follow-up, but it would be good to understand how we would handle the expected differences.)
# Backup the original conf.d directory
if [ -d /etc/containerd/conf.d ]; then
	cp -r /etc/containerd/conf.d /tmp/containerd-conf.d.backup
fi
What about /etc/containerd/config.toml (if it exists)?
Expect(err).ToNot(HaveOccurred(), "Failed to restart containerd after configuration restore") | ||
}) | ||
|
||
When("configuring containerd on a KIND node", func() { |
nit: Why reference KIND at all? We only selected the kind nodes since this was the simplest way to get containerd running in a container.
When("configuring containerd on a KIND node", func() { | |
When("configuring containerd", func() { |
ContainSubstring(`[plugins."io.containerd.grpc.v1.cri".containerd.runtimes.nvidia]`),
ContainSubstring(`[plugins.'io.containerd.cri.v1.runtime'.containerd.runtimes.nvidia]`),
As called out a number of times, this check is too broad. We should capture the differences between different containerd versions explicitly.
})

When("configuring containerd on a KIND node", func() {
It("should add NVIDIA runtime using drop-in config without modifying the main config", func(ctx context.Context) {
This is only true for containerd versions that INCLUDE the default /etc/containerd/conf.d imports.
))

// Verify the drop-in config was processed (config dump shows actual imported files)
Expect(output).To(ContainSubstring(`/etc/containerd/conf.d/*.toml`))
Can we please load the output config as TOML and check specific values instead of checking substrings?
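A sketch of value-based checks, assuming `github.com/pelletier/go-toml/v2` is added to tests/go.mod:

```go
// Parse the dump and assert on actual values rather than substrings.
var cfg map[string]any
Expect(toml.Unmarshal([]byte(output), &cfg)).To(Succeed())

plugins, ok := cfg["plugins"].(map[string]any)
Expect(ok).To(BeTrue())
_, ok = plugins["io.containerd.cri.v1.runtime"].(map[string]any)
Expect(ok).To(BeTrue(), "expected the v3 CRI runtime plugin table")
```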
// Verify KIND settings are still present
Expect(output).To(ContainSubstring(`SystemdCgroup = true`))
How is this a Kind setting? Does the kind node ALWAYS use systemd cgroups?